> Google's architecture, Informative news story

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Mar 13 2006, 12:05 AM
Recently, one of Google's engineers spoke at EclipseCon 2005 and talked a little about the architecture behind how Google works. Urs Hoelzle gave some details about the computer system that Google uses to provide search results, details that I think many folks haven't heard before.

Peeking Into Google


Here are a couple of snippets:

QUOTE
Google replicates the Web pages it caches by splitting them up into pieces it calls "shards." The shards are small enough that several can fit on one machine. And they're replicated on several machines, so that if one breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. The engineers call these "chunk servers."
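The shard layout in the quote above could be sketched roughly like this. It is only a toy illustration, not Google's actual code: the function names, the MD5-based assignment, and the round-robin placement are all assumptions made for the example.

```python
import hashlib


def assign_shards(doc_ids, num_shards):
    """Split documents into shards by hashing their IDs (hypothetical scheme)."""
    shards = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        shards[int(digest, 16) % num_shards].append(doc_id)
    return shards


def place_replicas(num_shards, machines, replicas):
    """Put each shard on `replicas` different machines, round-robin.

    Shard s lands on machines s, s+1, ... (wrapping around), so one
    machine ends up holding several shards.
    """
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    placement = {}
    for shard in range(num_shards):
        placement[shard] = [
            machines[(shard + r) % len(machines)] for r in range(replicas)
        ]
    return placement


def pick_machine(shard, placement, down):
    """Return the first machine holding `shard` that is not in `down`."""
    for machine in placement[shard]:
        if machine not in down:
            return machine
    raise RuntimeError("all replicas of shard %d are down" % shard)
```

With six shards, three machines, and two copies each, every machine holds four shard copies, and a lookup for shard 0 falls back to the second machine if the first one breaks.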


and

QUOTE
The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.

To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."
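To make the cluster idea concrete, here is a minimal sketch of scoring text against clusters. The quote says the real clusters are learned and named automatically from huge amounts of data; the hand-written `CLUSTERS` table, the smoothing value, and the cosine comparison below are assumptions for illustration only, not what Google does.

```python
import math
import re

# Hypothetical, hand-written clusters standing in for learned ones.
CLUSTERS = {
    "bay_area": {"bay", "area", "berkeley", "oakland", "francisco"},
    "cooking": {"cooking", "cuisine", "vegetarian", "recipe", "baking"},
    "education": {"class", "courses", "lesson", "workshop", "school"},
    "cars": {"car", "engine", "tires", "sedan", "dealer"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_scores(text, clusters=CLUSTERS, smoothing=0.1):
    """Return a probability-like score per cluster (scores sum to 1)."""
    words = tokenize(text)
    raw = {}
    for name, vocab in clusters.items():
        hits = sum(1 for w in words if w in vocab)
        raw[name] = hits + smoothing  # smoothing keeps every score above zero
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def similarity(a, b):
    """Cosine similarity between two cluster-score dictionaries."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


query = cluster_scores("Bay Area cooking class")
match = cluster_scores("Berkeley courses: vegetarian cuisine")
other = cluster_scores("Used sedan dealer: engine and tires")
```

The query and the Berkeley page share no words, yet `similarity(query, match)` comes out far higher than `similarity(query, other)`, because both texts score well on the same clusters.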


Nice to get a quick peek under the cover now and then.

Technical Administrator

Group: Technical Administrators
Joined: 8-March 06
Posts: 2,650
From: Minneapolis/Saint Paul, MN
post Mar 13 2006, 04:04 AM
Thanks for pointing that out! This was a very interesting read - Google is clearly taking load balancing up a notch. I've known for a while about their 'chunking' practice, but never knew any details.

QUOTE
One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.


That's amazing! I've certainly found that Google is one of the most reliable sites I use.

Membership Admin & Moderator

Group: Membership Admin & Moderator
Joined: 30-September 05
Posts: 3,267
From: Some round-ish rock floating in a vacuum.
post Mar 13 2006, 04:30 AM
That's a great find, Bill. Thanks for it.

One thing that keeps striking me about Google is that they use the cheapest hardware available ($1000 a pop) and glue it all together using very clever software. All this is custom-made, which makes it harder for competitors to imitate Google. Sure, other SEs can deal with large datasets, but they would have to re-invent the basic tools all over again. In business jargon, the "barriers to entry" are quite high.

On a related question: why don't Yahoo and MSN do such presentations? They are excellent for marketing!

Quarter Grand Poster

Group: Members
Joined: 18-November 05
Posts: 410
From: Greater Washington DC area
post Mar 13 2006, 09:55 AM
Bill: You are an incredible source of information. Thanks.

As to clusters of phrases: I've seen some tools that suggest analogous words, but I suspect Google is building that based on its enormous database. It's interesting to hear that developing that requires enormous data. They are certainly the group to analyse data, and they are hiring enough sharp people to pick through this valuable information.

BTW: I noticed that Matt Cutts wanted to meet you at SES NYC. Did you get together? Your excellent work is spreading across the web.

Moderator Alumni

Group: Hall Of Fame
Joined: 31-August 02
Posts: 15,634
post Mar 13 2006, 12:08 PM
I did manage to meet up quickly with Matt at the end of one of the sessions that he spoke at, but only for a few minutes. We chatted briefly, and he told me that he liked what I was doing on my blog, which was nice to hear.

About the clusters of data, this is what I thought was interesting:

QUOTE
This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."


Makes you wonder how much data is enough.

There are some other tidbits of information about Google's architecture on the web, including a number of video seminars. I haven't seen much about the actual architecture that houses the searches for Yahoo! or MSN or Ask. Might be interesting to look around to see if there is anything about those online.




Untested

Group: Members
Joined: 17-February 06
Posts: 7
post Mar 13 2006, 03:19 PM
The source material for all this talk about shards and clustering is this University of Washington CSE Colloquium video featuring Google PhD Jeff Dean:

http://norfolk.cs.washington.edu/htbin-pos...56K_320x240.wmv

If you watch the entire video you discover that, at the end of the day, this is an employee recruitment session. Still, lots of good stuff.

There is a lengthy demonstration of the clustering technology that begins at 35:30.

It is important to note that Dr. Dean differentiates between the way search engines work today and what the goal is. He calls the search tool a Demo and a Model, and the tool is prominently marked DEMO.

I am conjecturing that subsequent statements by Google employees are based on watching this video as part of their prep work.

The Komodo Tale
Seattle, WA

This post has been edited by Komodo Tale: Mar 13 2006, 03:25 PM

Moderator

Group: Moderators
Joined: 6-March 03
Posts: 7,962
From: Langley, British Columbia, Canada
post Mar 13 2006, 03:42 PM
I only had time to watch part of that, Komodo Tale, but it was fascinating. Welcome to the Forums.

You've just got to tell us a little more about that cute name you have.